
    MMBGX: a method for estimating expression at the isoform level and detecting differential splicing using whole-transcript Affymetrix arrays

    Affymetrix has recently developed whole-transcript GeneChips, the ‘Gene’ and ‘Exon’ arrays, which interrogate exons along the length of each gene. Although each probe on these arrays is intended to hybridize perfectly to only one transcriptional target, many probes match multiple transcripts located in different parts of the genome or alternative isoforms of the same gene. Existing statistical methods for estimating expression do not take this into account and are thus prone to producing inflated estimates. We propose a method, Multi-Mapping Bayesian Gene eXpression (MMBGX), which disaggregates the signal at ‘multi-match’ probes. When applied to Gene arrays, MMBGX removes the upward bias of gene-level expression estimates. When applied to Exon arrays, it can further disaggregate the signal between alternative transcripts of the same gene, providing expression estimates of individual splice variants. We demonstrate the performance of MMBGX on simulated data and a tissue mixture data set. We then show that MMBGX can estimate the expression of alternative isoforms within one experimental condition, confirming our results by RT-PCR. Finally, we show that our method for detecting differential splicing has a lower error rate than standard exon-level approaches on a previously validated colon cancer data set.
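    To illustrate why ignoring multi-match probes inflates estimates, here is a minimal sketch. It assumes probe signals add linearly across every transcript a probe matches, and it swaps MMBGX's Bayesian model for plain non-negative least squares (`scipy.optimize.nnls`). The probe-to-transcript matrix and the expression values are hypothetical toy numbers, not data from the paper.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical probe-to-transcript design: rows are probes, columns are
# transcripts T1 and T2; a 1 means the probe sequence matches that transcript.
A = np.array([
    [1.0, 0.0],  # probe unique to T1
    [1.0, 0.0],  # probe unique to T1
    [1.0, 1.0],  # multi-match probe: hits both T1 and T2
    [0.0, 1.0],  # probe unique to T2
])

true_expr = np.array([100.0, 300.0])  # assumed expression of T1 and T2
signal = A @ true_expr                # noise-free, additive probe signals

# Naive estimate: average every probe that matches T1, ignoring that the
# multi-match probe also carries T2's signal.
naive_t1 = signal[A[:, 0] > 0].mean()

# Disaggregated estimate: non-negative least squares for signal = A @ expr.
est, _residual = nnls(A, signal)

print(f"naive T1 = {naive_t1:.1f}; NNLS T1 = {est[0]:.1f}, T2 = {est[1]:.1f}")
```

    Averaging the three probes that match T1 gives 200, because the shared probe also carries T2's signal; solving the joint system recovers 100 and 300. MMBGX instead fits a Bayesian model, which this sketch does not attempt to reproduce.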

    Interpretation of multiple probe sets mapping to the same gene in Affymetrix GeneChips

    BACKGROUND: Affymetrix GeneChip technology enables the parallel observation of tens of thousands of genes. It is important that the probe set annotations are reliable so that biological inferences can be made about genes that undergo differential expression. Probe sets representing the same gene might be expected to show similar fold changes/z-scores; however, this is in fact not the case. RESULTS: We have made a case study of the mouse Surf4 gene, chosen because it was reported to be represented by the same eight probe sets on the MOE430A array by both Affymetrix and Bioconductor in early 2004. Only five of the probe sets actually detect Surf4 transcripts. Two of the probe sets detect splice variants of Surf2. We have also studied the expression changes of the eight probe sets in a public-domain microarray experiment. The Surf4 transcripts are correlated in time, as are the Surf2 transcripts; however, the Surf4 and Surf2 transcripts are not correlated with each other. This proof of principle shows that expression observations can be used to confirm or rule out annotation discrepancies. We have also investigated groups of probe sets on the RAE230A array that are assigned to the same LocusID but show large variances in differential expression in any one of three different experiments on rat. The probe set groups with high variances are found to represent cases of alternative splicing, use of alternative poly(A) signals, or incorrect annotations. CONCLUSION: Our results indicate that some probe sets should not be considered unique measures of transcription, because the individual probes map to more than one transcript depending on the biological condition. Our results highlight the need for care when assessing whether groups of probe sets all measure the same transcript.
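    The observation that co-varying probe sets measure the same transcript can be checked with pairwise correlations. The minimal sketch below groups probe sets whose time courses are strongly positively correlated. The probe-set IDs (`ps1` to `ps4`), the time-course values and the Pearson threshold of 0.9 are all hypothetical, and the grouping rule (connected components of the "r >= threshold" graph) is our own choice rather than the authors' procedure.

```python
import numpy as np

def group_by_correlation(profiles, threshold=0.9):
    """Group probe sets whose expression profiles have Pearson r >= threshold.

    profiles maps a probe-set ID to its expression values over time.
    Groups are the connected components of the "r >= threshold" graph.
    """
    names = list(profiles)
    r = np.corrcoef(np.array([profiles[n] for n in names]))
    parent = list(range(len(names)))  # union-find forest over probe sets

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if r[i, j] >= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical time courses for four probe sets annotated to one gene.
profiles = {
    "ps1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "ps2": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],
    "ps3": [5.0, 3.0, 6.0, 2.0, 4.0, 1.0],
    "ps4": [5.2, 2.9, 6.1, 2.2, 3.8, 1.1],
}
groups = group_by_correlation(profiles)
```

    Here `ps1`/`ps2` and `ps3`/`ps4` form separate groups, mirroring the Surf4/Surf2 pattern described above. With real data, the threshold would need to reflect the number of time points and the noise level.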

    A single-array preprocessing method for estimating full-resolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 & 6

    Motivation: High-resolution copy-number (CN) analysis has gained much attention in recent years, not only for identifying CN aberrations associated with a certain phenotype but also for identifying CN polymorphisms. For such studies to be successful and cost-effective, the statistical methods have to be optimized. We propose a single-array preprocessing method for estimating full-resolution total CNs. It is applicable to all Affymetrix genotyping arrays, including the recent ones that also contain non-polymorphic probes. A reference signal is needed only at the last step, when calculating relative CNs.
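    A minimal sketch of that last step, the only one that needs a reference. It assumes the common convention of scaling a total-CN signal θ by a diploid reference signal θ_R, so that C = 2θ/θ_R. It also assumes (our choice) that θ_R is the per-locus median over a hypothetical pool of reference arrays. The function name and all values are illustrative.

```python
import numpy as np

def relative_copy_numbers(theta, theta_refs):
    """Relative CN C = 2 * theta / theta_R at each locus.

    theta: total-CN signals of one preprocessed array, shape (loci,).
    theta_refs: signals of hypothetical reference arrays, shape (refs, loci);
    theta_R is taken as their per-locus median (an assumption of this sketch).
    The factor 2 assumes the reference is diploid.
    """
    theta_ref = np.median(theta_refs, axis=0)
    return 2.0 * np.asarray(theta) / theta_ref

theta = np.array([1000.0, 1500.0, 500.0])
theta_refs = np.array([
    [1000.0, 1000.0, 1000.0],
    [1100.0, 900.0, 1000.0],
    [900.0, 1100.0, 1000.0],
])
cn = relative_copy_numbers(theta, theta_refs)
```

    The earlier single-array steps produce θ for each array independently. Only this final ratio couples an array to any others.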

    TumorBoost: Normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays

    Background: High-throughput genotyping microarrays assess both total DNA copy number and allelic composition, which makes them a tool of choice for copy number studies in cancer, including total copy number and loss of heterozygosity (LOH) analyses. Even after state-of-the-art preprocessing methods, allelic signal estimates from genotyping arrays still suffer from systematic effects that make them difficult to use effectively for such downstream analyses. Results: We propose a method, TumorBoost, for normalizing allelic estimates of one tumor sample based on estimates from a single matched normal. The method applies to any paired tumor-normal estimates from any microarray-based technology, combined with any preprocessing method. We demonstrate that it increases the signal-to-noise ratio of allelic signals, making it significantly easier to detect allelic imbalances. Conclusions: TumorBoost increases the power to detect somatic copy-number events (including copy-neutral LOH) in the tumor from allelic signals of Affymetrix or Illumina origin. We also conclude that high-precision allelic estimates can be obtained from a single pair of tumor-normal hybridizations if TumorBoost is combined with single-array preprocessing methods such as (allele-specific) CRMA v2 for Affymetrix or BeadStudio's (proprietary) XY-normalization method for Illumina. A bounded-memory implementation is available in the open-source, cross-platform R package aroma.cn, which is part of the Aroma Project (http://www.aroma-project.org/).
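    One simple way to express the idea of normalizing a tumor against its single matched normal is sketched below. This is a simplified illustration, not the published TumorBoost estimator. It assumes allele B fractions (BAFs) in [0, 1] and genotype calls for the normal coded as their expected BAFs (0, 0.5 or 1). It subtracts the normal's per-SNP deviation from the tumor, then clips the result to [0, 1], which is our own choice. The function name `naive_tumorboost` and the input values are hypothetical.

```python
import numpy as np

def naive_tumorboost(beta_t, beta_n, mu_n):
    """Hypothetical, simplified tumor-normal BAF correction.

    beta_t, beta_n: allele B fractions of the tumor and matched normal.
    mu_n: genotype calls of the normal coded as expected BAFs (0, 0.5 or 1).
    """
    beta_t, beta_n, mu_n = map(np.asarray, (beta_t, beta_n, mu_n))
    eta = beta_n - mu_n            # SNP-specific deviation observed in the normal
    corrected = beta_t - eta       # assume the tumor shares the same deviation
    return np.clip(corrected, 0.0, 1.0)  # our own choice to keep BAFs in range

beta_n = np.array([0.55, 0.02, 0.45])   # hypothetical normal BAFs
mu_n = np.array([0.5, 0.0, 0.5])        # hypothetical normal genotype calls
beta_t = np.array([0.60, 0.03, 0.40])   # hypothetical tumor BAFs
boosted = naive_tumorboost(beta_t, beta_n, mu_n)
```

    The design relies on the matched normal sharing probe-specific artifacts with the tumor, so that a deviation seen in the normal can be removed from the tumor at the same SNP.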

    Adsorption models of hybridization and post-hybridisation behaviour on oligonucleotide microarrays

    Analysis of data from an Affymetrix Latin Square spike-in experiment indicates that measured fluorescence intensities of features on an oligonucleotide microarray are related to spike-in RNA target concentrations via a hyperbolic response function, generally identified as a Langmuir adsorption isotherm. Furthermore, the asymptotic signal at high spike-in concentrations is almost invariably lower for a mismatch feature than for its partner perfect match feature. We survey a number of theoretical adsorption models of hybridization at the microarray surface and find that in general they are unable to explain the differing saturation responses of perfect match and mismatch features. On the other hand, we find that a simple and consistent explanation can be found in a model in which equilibrium hybridization is followed by partial dissociation of duplexes during the post-hybridization washing phase. Comment: 26 pages, 6 figures, some rearrangement of sections and some additions. To appear in J. Phys. (Condensed Matter).
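    A minimal sketch of the hyperbolic (Langmuir) response, I(c) = b + A·c/(c + K), fitted with `scipy.optimize.curve_fit`. The parameter names, the concentration series (0.125 to 512 in doubling steps) and the perfect match (PM) and mismatch (MM) parameter values are hypothetical. They were chosen only so that the MM feature saturates lower than its PM partner. The washing-phase dissociation model proposed above is not modeled here.

```python
import numpy as np
from scipy.optimize import curve_fit

def langmuir(c, background, a_max, k_d):
    """Hyperbolic (Langmuir isotherm) response I(c) = b + A * c / (c + K)."""
    return background + a_max * c / (c + k_d)

# Hypothetical spike-in series: 0.125 to 512 in doubling steps.
conc = 0.125 * 2.0 ** np.arange(13)

# Hypothetical PM and MM features; the MM asymptote (b + A) is lower.
pm = langmuir(conc, 50.0, 8000.0, 16.0)
mm = langmuir(conc, 50.0, 3000.0, 40.0)

# Recover the PM parameters from its noise-free response curve.
params, _cov = curve_fit(langmuir, conc, pm, p0=(100.0, 5000.0, 10.0))
background, a_max, k_d = params
```

    At c = K the response is halfway between the background and the asymptote b + A. With real spike-in data, noise and the choice of starting values `p0` both affect the fit.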

    Establishing a major cause of discrepancy in the calibration of Affymetrix GeneChips

    Background: Affymetrix GeneChips are a popular platform for performing whole-genome experiments on the transcriptome. There is a range of different calibration steps, and users are presented with choices of different background subtractions, normalisations and expression measures. We wished to establish which of the calibration steps resulted in the biggest uncertainty in the sets of genes reported to be differentially expressed. Results: Our results indicate that the sets of genes identified as being most significantly differentially expressed, as estimated by the z-score of fold change, are relatively insensitive to the choice of background subtraction and normalisation. However, the contents of the gene list are most sensitive to the choice of expression measure. This is irrespective of whether the experiment uses a rat, mouse or human chip and whether the chip definition is made using probe mappings from Unigene, RefSeq, Entrez Gene or the original Affymetrix definitions. It is also irrespective of whether both Present and Absent calls, or just Present calls, from the MAS5 algorithm are used to filter gene lists, and this conclusion holds for genes of differing intensities. We also reach the same conclusion after assigning genes to be differentially expressed using t-statistics, although this approach results in a large number of false positives in the sets of genes identified, owing to the small numbers of replicates typically used in microarray experiments. Conclusion: The major calibration uncertainty that biologists need to consider when analysing Affymetrix data is how their multiple probe values are condensed into one expression measure.
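    The gene lists above are ranked by the z-score of fold change, which the abstract does not define. The minimal sketch below rests on one plausible assumption: the log2 fold change is the difference of mean log2 expression between two conditions, standardized across all genes on the array. The function name and the toy matrices are hypothetical.

```python
import numpy as np

def fold_change_z_scores(expr_a, expr_b):
    """z-scores of log2 fold change, standardized across genes (assumed definition).

    expr_a, expr_b: log2 expression measures, shape (genes, replicates),
    for conditions A and B.
    """
    lfc = expr_b.mean(axis=1) - expr_a.mean(axis=1)  # log2 fold change per gene
    return (lfc - lfc.mean()) / lfc.std(ddof=1)

# Hypothetical toy data: 4 genes x 2 replicates; only gene 3 changes.
expr_a = np.zeros((4, 2))
expr_b = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [4.0, 4.0]])
z = fold_change_z_scores(expr_a, expr_b)
ranking = np.argsort(-np.abs(z))  # genes ordered by |z|, largest first
```

    The log2 values fed into this function come from whichever expression measure condenses the probe intensities. According to the abstract, that choice changes the resulting gene lists more than any other calibration step.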

    Unsupervised assessment of microarray data quality using a Gaussian mixture model

    Background: Quality assessment of microarray data is an important and often challenging aspect of gene expression analysis. This task frequently involves the examination of a variety of summary statistics and diagnostic plots. The interpretation of these diagnostics is often subjective and generally requires careful expert scrutiny. Results: We show how an unsupervised classification technique based on the Expectation-Maximization (EM) algorithm and the naïve Bayes model can be used to automate microarray quality assessment. The method is flexible and can easily be adapted to accommodate alternative quality statistics and platforms. We evaluate our approach using Affymetrix 3' gene expression and exon arrays and compare the performance of this method to a similar supervised approach. Conclusion: This research illustrates the efficacy of an unsupervised classification approach for the purpose of automated microarray data quality assessment. Since our approach requires only unannotated training data, it is easy to customize and to keep up-to-date as technology evolves. In contrast to other "black box" classification systems, this method also allows for intuitive explanations.
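    Below is a minimal sketch of unsupervised two-class EM with a naïve Bayes likelihood, meaning each quality statistic is modeled as an independent Gaussian within a class. It follows the spirit of the approach described above. The quality statistics, the two-class setup, the median-split initialization on the first statistic and the variance floor are our assumptions, not details taken from the paper. The function name `em_naive_bayes` is hypothetical.

```python
import numpy as np

def em_naive_bayes(X, n_iter=100, var_floor=1e-6):
    """Two-class EM with independent per-statistic Gaussians (naive Bayes).

    X: shape (arrays, quality statistics). Returns P(class 1) per array.
    Assumes the first statistic is not constant (used for initialization).
    """
    # Initialize with a hard median split on the first statistic.
    resp = (X[:, 0] > np.median(X[:, 0])).astype(float)
    for _ in range(n_iter):
        # M-step: class weights, means and per-statistic variances.
        r = np.stack([1.0 - resp, resp])                  # shape (2, arrays)
        w = r.sum(axis=1) / len(X)
        mu = (r @ X) / r.sum(axis=1, keepdims=True)       # shape (2, stats)
        var = np.stack([(r[k] @ (X - mu[k]) ** 2) / r[k].sum()
                        for k in range(2)]) + var_floor
        # E-step: log-likelihood is a sum over independent statistics.
        loglik = np.stack([
            -0.5 * np.sum(np.log(2 * np.pi * var[k]) + (X - mu[k]) ** 2 / var[k], axis=1)
            for k in range(2)
        ]) + np.log(w)[:, None]
        loglik -= loglik.max(axis=0)                      # numerical stability
        p = np.exp(loglik)
        resp = p[1] / p.sum(axis=0)
    return resp

# Hypothetical quality statistics (two per array) for seven arrays.
X = np.array([[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])
resp = em_naive_bayes(X)
```

    The last three arrays fall into a separate class. Deciding which class corresponds to low quality still requires inspecting the fitted class means for each statistic.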

    The Role of Particulate Matter-Associated Zinc in Cardiac Injury in Rats

    Background: Exposure to particulate matter (PM) has been associated with increased cardiovascular morbidity; however, the causative components are unknown. Zinc is a major element detected at high levels in urban air. Objective: We investigated the role of PM-associated zinc in cardiac injury. Methods: We repeatedly exposed 12- to 14-week-old male Wistar Kyoto rats intratracheally (1×/week for 8 or 16 weeks) to a) saline (control); b) PM having no soluble zinc (Mount St. Helens ash, MSH); c) whole-combustion PM suspension containing 14.5 μg/mg of water-soluble zinc at high dose (PM-HD) and d) low dose (PM-LD); e) the aqueous fraction of this suspension (14.5 μg/mg of soluble zinc) (PM-L); or f) zinc sulfate (rats exposed for 8 weeks received double the concentration of all PM components given to rats exposed for 16 weeks). Results: Pulmonary inflammation was apparent in all exposure groups when compared with saline (greater at 8 weeks than at 16 weeks). PM with or without zinc, or zinc alone, caused small increases in focal subepicardial inflammation, degeneration, and fibrosis. Lesions were not detected in controls at 8 weeks but were noted at 16 weeks. We analyzed mitochondrial DNA damage using quantitative polymerase chain reaction and found that all groups except MSH caused varying degrees of damage relative to control. Total cardiac aconitase activity was inhibited in rats receiving soluble zinc. Expression array analysis of heart tissue revealed modest changes in mRNA for genes involved in signaling, ion channel function, oxidative stress, mitochondrial fatty acid metabolism, and cell cycle regulation in zinc-exposed but not in MSH-exposed rats. Conclusion: These results suggest that water-soluble PM-associated zinc may be one of the causal components involved in PM cardiac effects.

    Clonogenic growth of human breast cancer cells co-cultured in direct contact with serum-activated fibroblasts

    INTRODUCTION: Accumulating evidence suggests that fibroblasts play a pivotal role in promoting the growth of breast cancer cells. The objective of the present study was to characterize and validate an in vitro model of the interaction between small numbers of human breast cancer cells and human fibroblasts. METHODS: We measured the clonogenic growth of small numbers of human breast cancer cells co-cultured in direct contact with serum-activated, normal human fibroblasts. Using DNA microarrays, we also characterized the gene expression profile of the serum-activated fibroblasts. To validate the in vivo relevance of our experiments, we then analyzed clinical samples of metastatic breast cancer for the presence of myofibroblasts expressing α-smooth muscle actin. RESULTS: Clonogenic growth of human breast cancer cells obtained directly from in situ and invasive tumors was dramatically and consistently enhanced when the tumor cells were co-cultured in direct contact with serum-activated fibroblasts. This effect was abolished when the cells were co-cultured in transwells separated by permeable inserts. The fibroblasts in our experimental model exhibited a gene expression signature characteristic of 'serum response' (i.e. myofibroblasts). Immunostaining of human samples of metastatic breast cancer tissue confirmed that myofibroblasts are in direct contact with breast cancer cells. CONCLUSION: Serum-activated fibroblasts promote the clonogenic growth of human breast cancer cells in vitro through a mechanism that involves direct physical contact between the cells. This model shares many important molecular and phenotypic similarities with the fibroblasts naturally found in breast cancers.